FIFA World Cup


Mexico City's 'Xoli' Chatbot Will Help World Cup Tourists Navigate the City

WIRED

The Government of Mexico City has launched Xoli, a chatbot that will provide information on services, tourism, and cultural offerings. Named after the axolotl, a salamander with external gills, Xoli adds to the technological efforts promoted by the federal government to turn the 2026 World Cup into an engine of development for the entire country. The platform was designed to meet the demand of the millions of visitors expected to arrive during the 2026 FIFA World Cup. However, the authorities assure that the tool will remain active once the sporting event is over, with the aim of promoting economic activity and facilitating access to public services in the capital.


Mexico Preps for the 2026 World Cup With a Ticket Resale Platform and a Tourism App

WIRED

Mexico's consumer protection agency and FIFA are working on a "ticket relocation system" that will allow those with extra World Cup tickets to sell them safely and at appropriate prices. The Mexican government has presented its strategy to turn this summer's World Cup soccer tournament into an engine to strengthen trade, sports, tourism, and culture in the country where most of the games will be hosted. The Mexico 2026 Social World Cup project includes cultural events like soccer matches between robots, a public transit plan, and a new app where fans can securely sell any tickets they can't use. During a conference last week, Mexican President Claudia Sheinbaum stated that the intention is "to leave a sporting legacy in our country that goes beyond the competition itself." "[In this World Cup] the eyes of the world will be here," Sheinbaum said, "and what they will see is a great country with an enormous cultural heritage. They will see that we are building a nation that is fairer, freer, and more democratic."


Retracing the Past: LLMs Emit Training Data When They Get Lost

Ko, Myeongseob, Billa, Nikhil Reddy, Nguyen, Adam, Fleming, Charles, Jin, Ming, Jia, Ruoxi

arXiv.org Artificial Intelligence

The memorization of training data in large language models (LLMs) poses significant privacy and copyright concerns. Existing data extraction methods, particularly heuristic-based divergence attacks, often exhibit limited success and offer limited insight into the fundamental drivers of memorization leakage. This paper introduces Confusion-Inducing Attacks (CIA), a principled framework for extracting memorized data by systematically maximizing model uncertainty. We empirically demonstrate that the emission of memorized text during divergence is preceded by a sustained spike in token-level prediction entropy. CIA leverages this insight by optimizing input snippets to deliberately induce this consecutive high-entropy state. For aligned LLMs, we further propose Mismatched Supervised Fine-tuning (SFT) to simultaneously weaken their alignment and induce targeted confusion, thereby increasing susceptibility to our attacks. Experiments on various unaligned and aligned LLMs demonstrate that our proposed attacks outperform existing baselines in extracting verbatim and near-verbatim training data without requiring prior knowledge of the training data. Our findings highlight persistent memorization risks across various LLMs and offer a more systematic method for assessing these vulnerabilities.
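The paper's core observation, that verbatim emission is preceded by a sustained spike in token-level prediction entropy, can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the threshold and window size are placeholder values, and in practice the per-token probabilities would come from the model's softmaxed logits rather than hand-built lists.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sustained_high_entropy(entropies, threshold=2.0, window=3):
    """Indices where entropy has stayed above `threshold` for `window`
    consecutive tokens -- the sustained 'confusion' state that the paper
    reports precedes emission of memorized text."""
    hits, run = [], 0
    for i, h in enumerate(entropies):
        run = run + 1 if h > threshold else 0
        if run >= window:
            hits.append(i)
    return hits

# A near-uniform distribution over 10 tokens is high-entropy (ln 10 ~ 2.3);
# a peaked one is low-entropy.
uniform = [0.1] * 10
peaked = [0.99] + [0.01 / 9] * 9
trace = [token_entropy(peaked)] * 2 + [token_entropy(uniform)] * 4
print(sustained_high_entropy(trace))  # -> [4, 5]
```

An attack in the paper's spirit would then optimize the input snippet to push the model into, and keep it in, this high-entropy regime.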


MageBench: Bridging Large Multimodal Models to Agents

Zhang, Miaosen, Dai, Qi, Yang, Yifan, Bao, Jianmin, Chen, Dongdong, Qiu, Kai, Luo, Chong, Geng, Xin, Guo, Baining

arXiv.org Artificial Intelligence

LMMs have shown impressive visual understanding capabilities, with the potential to be applied in agents, which demand strong reasoning and planning abilities. Nevertheless, existing benchmarks mostly assess their reasoning abilities on the language side, where the chain-of-thought is entirely composed of text. We consider the scenario where visual signals are continuously updated and required along the decision-making process. Such a vision-in-the-chain reasoning paradigm is more aligned with the needs of multimodal agents, yet it is rarely evaluated. In this paper, we introduce MageBench, a reasoning-capability-oriented multimodal agent benchmark that, while having lightweight environments, poses significant reasoning challenges and holds substantial practical value. The benchmark currently includes three types of environments: WebUI, Sokoban, and Football, comprising a total of 483 different scenarios. It thoroughly validates the agent's knowledge and engineering capabilities, visual intelligence, and interaction skills. The results show that only a few product-level models perform better than random acting, and all of them fall far short of human-level performance. More specifically, we found that current models severely lack the ability to modify their planning based on visual feedback, as well as visual imagination, interleaved image-text long-context handling, and other abilities. We hope that our work will provide optimization directions for LMMs from the perspective of acting as an agent. We release our code and data at https://github.com/microsoft/MageBench.


OneLove beyond the field -- A few-shot pipeline for topic and sentiment analysis during the FIFA World Cup in Qatar

Rauchegger, Christoph, Wang, Sonja Mei, Delobelle, Pieter

arXiv.org Artificial Intelligence

The FIFA World Cup in Qatar was discussed extensively in the news and on social media. Due to news reports with allegations of human rights violations, there were calls to boycott it. Wearing a OneLove armband was part of a planned protest activity. Controversy around the armband arose when FIFA threatened to sanction captains who wore it. To understand what topics Twitter users tweeted about and what the opinion of German Twitter users was towards the OneLove armband, we performed an analysis of German tweets published during the World Cup using in-context learning with LLMs. We validated the labels against human annotations. We found that Twitter users initially discussed the armband's impact, LGBT rights, and politics; after the ban, the conversation shifted towards politics in sports in general, accompanied by a subtle shift in sentiment towards neutrality. Our evaluation serves as a framework for future research to explore the impact of sports activism and evolving public sentiment. This is especially useful in settings where labeling datasets for specific opinions is unfeasible, such as when events are unfolding.
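In-context labeling of this kind reduces to building a few-shot prompt and reading one label off the completion. A minimal sketch of the prompt construction is below; the example tweets, labels, and instruction wording are invented for illustration (the paper's actual German data and prompts are not reproduced here).

```python
def build_prompt(examples, tweet):
    """Assemble a few-shot stance-classification prompt.
    `examples` are (tweet_text, label) pairs; `tweet` is the item to label."""
    lines = [
        "Classify the tweet's stance toward the OneLove armband "
        "as positive, neutral, or negative.",
        "",
    ]
    for text, label in examples:
        lines.append(f"Tweet: {text}")
        lines.append(f"Stance: {label}")
    lines.append(f"Tweet: {tweet}")
    lines.append("Stance:")  # the LLM completes this line with a label
    return "\n".join(lines)

demo = build_prompt(
    [("Banning the armband is a disgrace.", "positive"),
     ("Just here for the football.", "neutral")],
    "FIFA should focus on the game, not politics.",
)
print(demo)
```

Sending the assembled prompt to an LLM and parsing the single completed label, then checking a sample against human annotations, mirrors the validation step the abstract describes.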


DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation

Rahman, A B M Ashikur, Anwar, Saeed, Usman, Muhammad, Mian, Ajmal

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities, revolutionizing the integration of AI in daily life applications. However, they are prone to hallucinations, generating claims that contradict established facts, deviating from prompts, and producing inconsistent responses when the same prompt is presented multiple times. Addressing these issues is challenging due to the lack of comprehensive and easily assessable benchmark datasets. Most existing datasets are small and rely on multiple-choice questions, which are inadequate for evaluating the generative prowess of LLMs. To measure hallucination in LLMs, this paper introduces a comprehensive benchmark dataset comprising over 75,000 prompts across eight domains. These prompts are designed to elicit definitive, concise, and informative answers. The dataset is divided into two segments: one publicly available for testing and assessing LLM performance and a hidden segment for benchmarking various LLMs. In our experiments, we tested six LLMs (GPT-3.5, Llama 2, Llama 3, Gemini, Mixtral, and Zephyr), revealing that overall factual hallucination ranges from 59% to 82% on the public dataset and 57% to 76% in the hidden benchmark. Prompt misalignment hallucination ranges from 6% to 95% in the public dataset and 17% to 94% in the hidden counterpart. Average consistency ranges from 21% to 61% and 22% to 63%, respectively. Domain-wise analysis shows that LLM performance significantly deteriorates when asked for specific numeric information while performing moderately with person, location, and date queries. Our dataset demonstrates its efficacy and serves as a comprehensive benchmark for LLM performance evaluation. Our dataset and LLM responses are available at https://github.com/ashikiut/DefAn.
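Two of the reported metrics are straightforward to operationalize: a hallucination rate (the fraction of responses that fail a factual check) and consistency across repeated issuances of the same prompt. The sketch below assumes answers can be compared by exact string match, which is a simplification; a real evaluation would need answer normalization or semantic matching.

```python
from collections import Counter

def hallucination_rate(responses, reference):
    """Fraction of responses that do not match the reference answer."""
    wrong = sum(1 for r in responses if r != reference)
    return wrong / len(responses)

def consistency(responses):
    """Fraction of responses agreeing with the most common answer
    when the same prompt is issued multiple times."""
    top_count = Counter(responses).most_common(1)[0][1]
    return top_count / len(responses)

# e.g. three runs of "When was the first World Cup held?"
answers = ["1930", "1930", "1934"]
print(hallucination_rate(answers, "1930"))  # 1 of 3 wrong
print(consistency(answers))                 # 2 of 3 agree
```

Averaging these per-prompt scores over the benchmark yields dataset-level figures like the percentage ranges quoted in the abstract.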


Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

RRV, Aswin, Tyagi, Nemika, Uddin, Md Nayem, Varshney, Neeraj, Baral, Chitta

arXiv.org Artificial Intelligence

This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from the common behavior observed in individuals searching the internet for facts with partial or misleading knowledge. Similar to using web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination mitigation strategies for reducing LLMs' sycophantic behavior. Our experiments demonstrate the effectiveness of these strategies for generating factually correct statements. Furthermore, our analyses delve into knowledge-probing experiments on factual keywords and different categories of sycophancy mitigation.


Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

Mecklenburg, Nick, Lin, Yiyou, Li, Xiaoxiao, Holstein, Daniel, Nunes, Leonardo, Malvar, Sara, Silva, Bruno, Chandra, Ranveer, Aski, Vijay, Yannam, Pavan Kumar Reddy, Aktas, Tolga, Hendry, Todd

arXiv.org Artificial Intelligence

In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. We present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge. This study contributes to the understanding of domain adaptation for LLMs and highlights the potential of SFT in enhancing the factuality of LLM responses in specific knowledge domains.
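The contrast between the two dataset-generation strategies can be made concrete: in fact-based scaling, the unit of replication is the atomic fact, so each fact receives the same number of training examples and coverage is even by construction. The sketch below reflects the general shape of such a process, not the paper's actual pipeline; `make_example` is a hypothetical stand-in for whatever paraphrasing or Q&A-generation step turns a fact into a training sample.

```python
def fact_based_dataset(facts, samples_per_fact, make_example):
    """Generate SFT examples with uniform per-fact coverage.
    Token-based scaling, by contrast, fixes a total token budget,
    so frequently mentioned facts can crowd out rare ones."""
    data = []
    for fact in facts:
        for i in range(samples_per_fact):
            data.append(make_example(fact, i))
    return data

facts = ["Team A won the 2023 final", "Player B scored twice"]
dataset = fact_based_dataset(
    facts, 2,
    lambda fact, i: {"question": f"(variant {i}) What happened?", "answer": fact},
)
print(len(dataset))  # 2 facts x 2 samples per fact = 4 examples
```

Fine-tuning on a dataset built this way gives every new fact an equal number of gradient updates, which is the evenness-of-coverage property the abstract attributes to fact-based scaling.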


AI's Impact on the 2022 FIFA World Cup

#artificialintelligence

Artificial intelligence (AI) has the potential to impact many areas of our lives, including sports. The question of whether AI has affected the FIFA 2022 World Cup is an interesting one, and there are a number of ways in which AI may have influenced the event. One potential impact of AI on the FIFA 2022 World Cup is in the realm of player performance analysis. AI can be used to analyze vast amounts of data on players' movements, actions, and performance on the pitch, providing valuable insights for coaches and teams. This data can be collected using sensors and tracking systems that are embedded in players' jerseys or placed around the pitch.


Visualising the FIFA World Cup final

Al Jazeera

On Sunday, December 18, on the pitch of Lusail Stadium in Qatar, Argentina will take on 2018 defending champions France for football's most coveted trophy. The FIFA World Cup, now in its 22nd edition, has been held every four years since 1930, except in 1942 and 1946 because of World War II. Over its 92-year history, 79 nations have battled it out for the top prize. Of these, 13 countries have made it to the finals, with eight being crowned champions. Only European and South American teams have ever reached the finals.
